Covariate dimension reduction for survival data via the Gaussian process latent variable model.
Authors
Abstract
The analysis of high-dimensional survival data is challenging, primarily owing to the problem of overfitting, which occurs when spurious relationships are inferred from data that subsequently fail to exist in test data. Here, we propose a novel method of extracting a low-dimensional representation of covariates in survival data by combining the popular Gaussian process latent variable model with a Weibull proportional hazards model. The combined model offers a flexible non-linear probabilistic method of detecting and extracting any intrinsic low-dimensional structure from high-dimensional data. By reducing the covariate dimension, we aim to diminish the risk of overfitting and increase the robustness and accuracy with which we infer relationships between covariates and survival outcomes. In addition, we can simultaneously combine information from multiple data sources by expressing multiple datasets in terms of the same low-dimensional space. We present results from several simulation studies that illustrate a reduction in overfitting and an increase in predictive performance, as well as successful detection of intrinsic dimensionality. We provide evidence that it is advantageous to combine dimensionality reduction with survival outcomes rather than performing unsupervised dimensionality reduction on its own. Finally, we use our model to analyse experimental gene expression data and detect and extract a low-dimensional representation that allows us to distinguish high-risk and low-risk groups with superior accuracy compared with doing regression on the original high-dimensional data.
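As a rough illustration of the survival half of the combined model, the sketch below evaluates a right-censored Weibull proportional hazards log-likelihood on low-dimensional latent coordinates. The abstract does not give the parameterization, so the form used here (baseline hazard `lam * nu * t**(nu - 1)`, linear predictor `z @ beta`) is an assumption, as are the function name `weibull_ph_loglik` and all argument names. The Gaussian process latent variable part, which would supply `z`, is not shown.

```python
import numpy as np


def weibull_ph_loglik(times, events, z, beta, lam, nu):
    """Right-censored log-likelihood of a Weibull proportional hazards model.

    Assumed parameterization (not taken from the paper):
        hazard             h(t | z) = lam * nu * t**(nu - 1) * exp(z @ beta)
        cumulative hazard  H(t | z) = lam * t**nu * exp(z @ beta)

    times  : (n,) observed times, all > 0
    events : (n,) 1 if the event was observed, 0 if right-censored
    z      : (n, q) low-dimensional latent covariates (hypothetical input)
    beta   : (q,) regression coefficients
    lam    : positive scale of the baseline hazard
    nu     : positive Weibull shape parameter
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    z = np.asarray(z, dtype=float)
    beta = np.asarray(beta, dtype=float)

    eta = z @ beta  # linear predictor, one value per individual
    log_hazard = np.log(lam) + np.log(nu) + (nu - 1.0) * np.log(times) + eta
    cum_hazard = lam * times**nu * np.exp(eta)

    # Observed events contribute log h(t) - H(t); censored cases contribute -H(t).
    return float(np.sum(events * log_hazard - cum_hazard))


if __name__ == "__main__":
    # Synthetic inputs for illustration only; these are not data from the study.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(5, 2))
    times = rng.uniform(0.5, 3.0, size=5)
    events = np.array([1, 0, 1, 1, 0])
    print(weibull_ph_loglik(times, events, z, beta=[0.3, -0.2], lam=0.5, nu=1.2))
```

In a joint model of the kind the abstract describes, a term like this would presumably be added to the latent variable model's objective so that the latent coordinates are shaped by the survival outcomes as well as by the covariates.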
Similar resources
Stratification of patient trajectories using covariate latent variable models
Standard models assign disease progression to discrete categories or stages based on well-characterized clinical markers. However, such a system is potentially at odds with our understanding of the underlying biology, which in highly complex systems may support a (near-)continuous evolution of disease from inception to terminal state. To learn such a continuous disease score one could infer a la...
Dependent Indian Buffet Processes
Latent variable models represent hidden structure in observational data. To account for the distribution of the observational data changing over time, space or some other covariate, we need generalizations of latent variable models that explicitly capture this dependency on the covariate. A variety of such generalizations has been proposed for latent variable models based on the Dirichlet proce...
Bayesian Analysis of Survival Data with Spatial Correlation
In practice, survival data on living units are often correlated because of the locations of the observations in the study. One of the most important issues in the analysis of survival data with spatial dependence is the estimation of the parameters and the prediction of unknown values at known sites based on the vector of observations. In this paper, to analyze this type of survival, Cox...
Gaussian Process Latent Variable Models for Dimensionality Reduction and Time Series Modeling
High-dimensional time series data are frequently encountered in fields such as robotics, computer vision, economics and motion capture. In this survey paper we first look at the Gaussian Process Latent Variable Model (GPLVM), which is a probabilistic nonlinear dimensionality reduction method. We then discuss the Gaussian Process Dynamical Model (GPDM), which is based on the GPLVM. GPDM is a probabilistic ap...
Monocular Tracking 3D People with Back Constrained Scaled Gaussian Process Latent Variable Models
Tracking 3D people from monocular video is often poorly constrained. To mitigate this problem, prior information can be exploited. In the prior-learning stage, most algorithms treat the representation of the high-dimensional pose space in a low-dimensional space as a dimension reduction procedure, without considering the geometrical relations or temporal correlations in pose space. Therefore, the prior loses physical...
Journal title:
- Statistics in Medicine
Volume 35, Issue 8
Pages -
Publication date 2016